ABSTRACT


An abundance of Data Base Management Systems and
Query Languages already exist, not to mention those
which have been, and continue to be, proposed. Most
Data Base Management System surveys focus on the type
of model used to represent the data, methods of
access, protection, etc. This paper acquaints the EDP
Manager with the fundamental differences among the
more significant query languages with emphasis on
those characteristics which should be considered when
choosing a query language. The term query language as
used here has been expanded to include the entire user
interface to the data base, and encompasses both data
sublanguages and stand-alone query languages.
I. BACKGROUND


A. DATA BASE - WHAT IS IT?


A data base is not just a file system. While a data
base could include a file system, it is much broader in
scope. In general, an automated file system is a continuous
group of fixed length records, sequentially ordered, which
are accessed through card readers, tape units, and usually
slow-speed rotating storage devices.


The data base came of age with the advent of fast,
relatively inexpensive random access devices. A data file
previously tied to and used with a specific application
program was often unavailable to other users. This meant
that for each new application a new program would be written,
necessitating a new data file relevant to that application.
This led to much duplication of data, which, when combined
with infrequent and inconsistent updating methods, produced
a predictably large proliferation of redundant, often
outdated, data files.


Martin [Ref. 1] provides the following definition of a
data base in contrast to traditional file structures:


"A data base may be defined as a collection of
interrelated data stored together with as little redundancy
as possible to serve one or more applications in an optimal
fashion; the data are stored so that they are independent of
programs which must use the data; a common and controlled
approach is used in adding new data and in modifying and
retrieving existing data within the data base."


Summarizing, a data base, as compared to a file system,
reduces data redundancy, proliferation, and inconsistencies,
permits shared access, and provides improved data integrity
and comprehensive data protection.


B. HISTORY


Prior to the age of the computer, data was stored and
controlled in some form of clerical ledger. Thus manual
manipulation of information was severely restricted by labor
costs and the output capacity per clerk. Additionally, such
systems were subject to a high error rate and were typically
redundant.


Of the early attempts at integrating information
systems, the one most often mentioned is a project developed
by the Mitre Corporation for the U.S. Air Force Electronics
System Division. The outgrowth of the project was the
Advanced Data Management System (ADAM), significant for its
external data definition facility, which allowed different
data base applications to use a common retrieval system.


Early data base systems employed exclusively low-level 
query languages. As Data Base Management System (DBMS) 
technology has developed, there has been a parallel 
development of query languages, not unlike the evolution of 
high-level programming languages during the development of 


modern computer systems. 


Fry and Sibley [Ref. 2] cite three significant families
of systems developed in the first decade of DBMS technology:
the formatted file/GIS family originated at the David Taylor
Model Basin around 1958; the Bachman/IDS family, an
Integrated Data Store facility developed at General Electric
which was noted for its random access storage and high-level
data manipulation language; and the Postley/MARK IV family
for the IBM System/360.


The Data Base Task Group (DBTG), a CODASYL programming
language committee formed to extend COBOL to operate in a
data base environment, made a series of reports beginning in
1969 [Refs. 3, 4, 5]. These reports generally
approached the data management question on the basis of
using two separate languages: the Data Definition
(Description) Language (DDL) and the Data Manipulation
Language (DML).


The DBTG reports marked what is commonly regarded as the
beginning of the second generation of Data Base Management
Systems. Noteworthy examples of query languages in the
CODASYL/DBTG family include DMS 1100 (UNIVAC 1110 series)
and IDMS (IBM System/360). Paralleling the growth of DBTG
systems, the relational model, first proposed by Codd
[Ref. 6] in 1970, began to receive widespread attention and
has been the subject of a great deal of academic research
and debate.


C. RECENT DEVELOPMENTS


Considering the dominance of IBM in the data processing
field, it is hardly surprising that most commercially
available data base management software systems today run on
IBM equipment. These include: Information Management
System/360 (IMS/360), released by IBM in 1969; ADABAS,
released by Software AG in 1971; IDMS (Cullinane
Corporation, 1973); System 2000 (MRI Systems, 1970); and
TOTAL (Cincom Systems, 1971).


While the substance of the commercial market today
remains in the realm of the CODASYL/IMS network/hierarchical
system approach to DBMS, significant effort in recent years
has been devoted to relational data bases. A notable effort
in this area is the Interactive Graphics and Retrieval
System (INGRES), a PDP-11/40 based hardware configuration
installed and running on top of the UNIX Operating System at
the University of California, Berkeley [Refs. 7, 8].
Another major effort is System R [Refs. 9, 10], a relational
implementation developed at the IBM San Jose Research Lab.
System R runs on an IBM/370 using VM/370 and provides a
complete data base management capability.


Ceo eu MACHINE 


Almost the entire thrust of recent DBMS proposals
appears to have been in the direction of relational models
underlying a non-procedural query language interface.
Furthermore, each new proposal would seem to be designed for
a more casual class of user than its predecessor. This is
in an effort to make the machine perform ever more of the
thought processes and improve the efficiency of interaction
with the user. Obviously, as the machine assumes a greater
role in the interaction, the user's efficiency increases and
the costs both in processor time and software escalate. It
would seem that there must come a point when it will be
recognized that even the most casual user will always be
required to possess at least a minimal, rudimentary
knowledge of the environment in which his queries are to be
formulated.
In the next chapter, the terminology associated with
query languages and data bases is presented. Chapter III
presents proposed criteria for measuring and selecting a
query language. Chapter IV will present the characteristics
of some of the better known, more widely used, or more
interesting query languages. The final chapter provides
some further guidance for selecting a query language,
discusses some implications of the current trends in DBMS
and examines some of the prospects for future systems.


II. TERMINOLOGY


A. DATA BASE MODELS


Much effort has been expended in comparing the three
general methods of organizing data by DBMS's. The
advantages and disadvantages of each are well established.
While the models are not the central issue here, a few words
will be devoted to discussion of them in order to provide a
framework from which to explore alternative methods of
comparison. Martin [Ref. 1] and Date [Ref. 11] provide
additional material on data base models.


1. Network 


In the network approach, typified by systems of the
CODASYL/DBTG family, record occurrences are represented as
nodes of a network, chained together by named, directed
arcs. The arcs present logical links between the entities
which can be traversed in the specified direction in order
to navigate through the data base.


2. Hierarchy


A restricted form of the network approach is the
hierarchical model in which record occurrences are
represented as nodes of a tree in a strictly owner-member
(or more traditionally parent-child) relationship.


3. Relational


In the relational model data is viewed as a group of
tables or flat files (relations). Each table is composed of
rows (tuples). The order of the columns (attributes) within
tables is of no significance and no hierarchical or graphic
relationship exists among the tables containing the data.
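
The contrast between linked records and flat tables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer and account records, their field names, and the balances are hypothetical sample data, and the two helper functions are not taken from any system discussed here.

```python
# Hypothetical sample data: one customer who owns two accounts.

# Network / hierarchical view: records are nodes, and named, directed
# links (owner -> member) are followed to move from record to record.
customer = {"name": "James M. Simpson", "owns": []}
checking = {"kind": "checking", "balance": 250, "owner": customer}
savings = {"kind": "savings", "balance": 900, "owner": customer}
customer["owns"] = [checking, savings]


def balance_by_navigation(owner_record, kind):
    """Follow the 'owns' link from the owner record to its members."""
    for account in owner_record["owns"]:
        if account["kind"] == kind:
            return account["balance"]
    return None


# Relational view: flat tables of rows (tuples). Rows are associated by
# matching attribute values, not by stored links between records.
CUSTOMERS = [(1, "James M. Simpson")]                    # (cust_id, name)
ACCOUNTS = [(1, "checking", 250), (1, "savings", 900)]   # (cust_id, kind, balance)


def balance_by_value(name, kind):
    """Match rows of the two tables on the shared cust_id value."""
    ids = {cust_id for cust_id, cust_name in CUSTOMERS if cust_name == name}
    for cust_id, acct_kind, balance in ACCOUNTS:
        if cust_id in ids and acct_kind == kind:
            return balance
    return None
```

In the first form the answer is reached by traversing a link that must already exist in the stored structure; in the second it is reached by comparing values, so no link between the tables has to be stored.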


B. DATA INDEPENDENCE


General users of a data base do not want to be and
should not have to be concerned with data base
implementation details such as access methods, character
representation, or a host of other physical implementation
and operating system particulars. All they need is a "view"
of the data that will allow them to formulate queries and
manipulate data. These users desire an "independence" from
implementation details.


These details of access method, character
representation, floating-point and integer representation,
pointers, and record blocking are referred to as the
physical structure of a data base. Freedom from the storage
and access details gives the user "physical data
independence." What the user is provided in the place of a
physical view is a "logical view" of the data. Furthermore,
it is often advantageous to provide different users with
individually tailored logical views of the data. To meet
this need and to give the system added flexibility, the
following general approach is normally taken.


A system logical view of the data, termed a schema, is
defined. For hierarchical and network based systems the
schema describes the relationship between record types and
specifies the contents of record fields (data items).
Similarly, it describes the structure of the relations in
relational systems. Subschemas are then defined which give
each user, or group of users, their own logical view of the
data. Thus, users are provided with "logical data
independence."


A system that provides true physical data independence
would allow the physical storage and access details to be
changed without affecting the logical structure of the data
(schemas and subschemas). True logical data independence
exists only when logical changes can be made to the data
base without significantly affecting the programs which
access it [Ref. 2]. With one recent exception, Apple
[Ref. 12], which is discussed in Chapter IV, logical data
independence is currently more of a goal of a data base than
a characteristic.


C. QUERY LANGUAGE DEFINITION


A data definition facility must exist to translate the
schema and subschemas into a form usable by the data base
system. A data manipulation facility is required to allow
data in the system to be deleted, changed, and manipulated.
The data definition facility, or Data Definition Language
(DDL), describes the details and content of the schema and
subschema to the system. The data definition language may
be a separate language available only to the Data Base
Administrator (DBA), or it may be an extension of an
application programming language or query language. There
may, in fact, be two DDL's: one to define the schema and
another to define subschemas. Alternatively, portions of
the data definition facility used for defining subschemas
may exist as an extension to a query language; for example,
the DEFINE VIEW statement of "embedded" SEQUEL allows the
user to create a view against which he may issue queries or
define other views [Ref. 9].
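
The idea of a view serving as a user's subschema can be illustrated with a present-day descendant of SEQUEL. The sketch below is an assumption-laden analogue, not SEQUEL itself: it uses the SQL statement CREATE VIEW through Python's standard sqlite3 module rather than the DEFINE VIEW statement cited above, and the table, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory data base holding the "schema" level table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT, kind TEXT, balance INTEGER, branch TEXT)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        ("James M. Simpson", "checking", 250, "Monterey"),
        ("James M. Simpson", "savings", 900, "Monterey"),
        ("A. Jones", "checking", 75, "Salinas"),
    ],
)

# The view plays the role of a subschema: this user sees only checking
# accounts, and never sees the branch column at all.
conn.execute(
    "CREATE VIEW checking_balance AS "
    "SELECT owner, balance FROM account WHERE kind = 'checking'"
)

# Queries are issued against the view exactly as if it were a table.
rows = conn.execute(
    "SELECT owner, balance FROM checking_balance ORDER BY owner"
).fetchall()
```

A user restricted to the view can formulate queries without knowing that other account kinds or the branch attribute exist in the underlying table.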


The data manipulation language, in addition to having a
query capability, characteristically provides the facility
to update, create, and remove data base entities. Other
operators typically include COUNT, SUM, MAX, MIN, AVG,
relational operators, boolean operators, and arithmetic
operators. Like the data definition facility, the data
manipulation facility may be an extension of a host
application programming language; in such cases it is
referred to as a data sublanguage (DSL) and is said to be
"embedded" in the host language. The data manipulation
facility may also exist as a stand-alone query language
through which the user interacts directly with the DBMS;
some authors [Ref. 10] limit the use of the term "query
language" to languages of the stand-alone variety. No
distinction will be made here between a DSL and a query
language except to point out that the latter is generally,
though not necessarily, less procedural (this term will be
defined later). The term query language will be used to
refer to both, and is simply defined as the user interface
to the data base.


In evaluating general purpose programming languages,
consideration is normally given to the following: syntactic
clarity, data structures, control structures, operators,
efficiency of program execution, and, more recently,
efficiency of program design, efficiency of problem
solution, and compatibility with top-down programming
techniques [Ref. 13]. To use these same characteristics to


examine query languages amounts to looking at query
languages only through the eyes of a programmer.
Application programmers would probably be quite content with
the "procedural" query languages selected using these
criteria; however, not everyone who wishes to interface
directly with a data base is a programmer.
III. USER-LANGUAGE COMPATIBILITY
Having recognized the importance of identifying the
various user groups, this chapter defines five classes of
user. The differences among these classes and the changing
relative importance of them establish the need for measures
of query languages. Two quantitative and three qualitative
measures are proposed.


A. CATEGORIES OF USERS


Codd [Ref. 14] divides the users of data bases into
five classes.


1. System Analysts


The system analysts are responsible for maintaining
the data base management system, a function which includes
creating or altering logical views of the data.


2. Application Programmers


The application programmer serves as the middle-man
for most of today's data processing needs; his function
should be limited to designing and optimizing frequently
executed routine queries or those queries that are
inappropriate for more non-procedural query languages (more
will be said about this later).


3. On-line Job-trained Users


This group includes bank tellers and insurance
company clerks who use the data base to answer routine
queries on a random basis. The needs of this group are
structured in nature, allowing most of their queries to be
formalized. An example might be, "What is the balance of
James M. Simpson's checking account?"


4. Researchers 


This class of users is quite diverse but their
queries could probably be characterized as being ad hoc and
requiring aggregate results. Users in this group are most
likely willing to wait a few hours or even days for an
answer.


5. Casual User


Some authors [Ref. 15] seem willing to extend this 
term to include almost anyone; it is not unlikely that by 
the 1990's this may well be justified. However, at present 
a practical need is seen to limit this term to users such as 
Managers, lawyers, analysts, accountants, and planners. 
These people need the information in the data base to help 
them make decisions but prefer not to encounter the expense 
or experience the delay in going through a third party, 


perhaps an application programmer, to process their queries. 


Not only is there a varied group who use, or would like
to interact with, a data base, but the distribution of
interactions by the five classes is changing. In the early
seventies, most data base interactions were by application
programmers. In the next two decades the number of
interactions by this group is expected to be less
significant; a corresponding increase in the significance of
interactions by on-line job-trained users seems imminent.
But, by far, the casual user class seems to be the emerging
dominant force.


Why is the role of the casual user on the rise? An
increasingly large number of people have recognized the
value of the data base and the lost opportunity costs that
result from not being able to interface with it at a level
in which the machine properly complements man's decision
making. In many instances availability of the proper query
language could eliminate the need to go through the
application programmer to answer a query, which seems
particularly desirable in an era of increasing manpower
costs. In many data base systems, such as military command
and control systems, having the ability to answer queries as
close as possible to the level at which critical decisions
are made can be of particular importance. Some commercial
vendors view developing the casual user market as a matter
of survival [Ref. 16].


Note that the need to cater to those at the casual end 
of the user spectrum in no way implies that the languages 
that are used comfortably by the more computer oriented 


users do not and will not continue to play an important role 


in data base system interface. 


B. THE NEED FOR MEASURES


What are the differences which exist among the classes
of user that affect the type of query language with which
they would be comfortable? There are certainly differences
in the understanding of how a computer works, internal data
representations, operating systems, and programming
experience. The latter is of particular importance since
non-programmers may not be acquainted with such notions as
data structures, looping, branching, and program efficiency.
There are quite likely differing levels of mathematical
sophistication. Differences in the number of times a user
interacts with a data base and the interval between
interactions are significant as an indication of the amount
of time and effort a user can devote to learning a query
language. The infrequent user may experience difficulty
retaining syntactic details of some query languages.


In short, these differences point to the need for query
languages compatible with users of varying qualifications
and varying needs. This may imply having several query
languages running concurrently on a data base. The more
casual users do not desire to know and should not be
required to learn the structure of the underlying data
base, access methods, or programming control structures.
Query languages giving users freedom from these details will
require the machine to play a greater role in the
man-machine symbiosis.


C. MEASURES OF QUERY LANGUAGES 


With the need well established, qualitative and 


quantitative measures of query languages are presented. 





1. Quantitative Measures 


Level and completeness are considered quantitative
measures. It is not essential that an evaluator apply these
measures directly to a language being evaluated; it is,
however, recommended that the concepts embodied in them be
at least subjectively applied. References 17 and 18 discuss
other measures of "software physics".


a. Level 


Level [Ref. 17] is a quantitative measure of
the amount of decision making that goes into the formation
of a program to solve a problem. The number of decisions
required to solve a given problem can be profoundly affected
by the language being used. The user may be forced to make
many decisions concerning syntax, delimiters, operators,
etc., which are of little or no significance to the problem
itself. Specifically, level is a mathematically derived
value, between zero and one, based on the number of
operators and operands used in the most efficient solution
of a problem in a language.


As an example (though perhaps extreme and
certainly a misuse of COBOL) of the difference in level,
consider the following problem: program in COBOL and in
APL the matrix multiplication of matrices A and B. The
result in APL is A+.×B. The amount of code generated to
solve the same problem in COBOL is obviously much greater.
In this example APL would have a level close to one while
COBOL would have a level close to zero.
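
One way to make the notion concrete is sketched below. It assumes that the level meant here can be approximated by Halstead's program level estimator, (2 / n1) * (n2 / N2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total number of operand occurrences; whether Reference 17 uses exactly this form is an assumption. The token lists are invented for illustration and are not counts taken from real APL or COBOL programs, and the clamp to 1.0 is a convenience of this sketch.

```python
def estimated_level(operators, operands):
    """Estimate level as (2 / n1) * (n2 / N2), clamped to at most 1.0.

    operators, operands: lists of tokens as they occur in a program.
    """
    n1 = len(set(operators))        # distinct operators
    n2 = len(set(operands))         # distinct operands
    total_operands = len(operands)  # N2: every operand occurrence
    if n1 == 0 or total_operands == 0:
        raise ValueError("need at least one operator and one operand")
    return min(1.0, (2.0 / n1) * (n2 / total_operands))


# Terse solution (hypothetical tokens): two operators, each operand used once.
terse = estimated_level(operators=["<-", "+.x"], operands=["C", "A", "B"])

# Verbose solution (hypothetical tokens): many operators, operands repeated.
verbose = estimated_level(
    operators=["MOVE", "PERFORM", "VARYING", "MULTIPLY", "ADD", "IF", ">", "GO TO"],
    operands=["I", "J", "K", "A", "B"] * 4,
)
```

With these invented counts the terse form evaluates to 1.0 and the verbose form to 0.0625, which matches the direction of the comparison drawn above: more distinct operators and more repeated operands both pull the value toward zero.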


An evaluator may want to generate some benchmark
algorithm to actually determine the level of languages under
consideration. Those languages that consistently have low
levels will generally prove to be procedural in nature, have
lower query design efficiency, hold the user responsible for
exception or error checking, depend on the user for insuring
efficiency of execution, and are thus less suited to the
casual user. The opposite characteristics are generally
true for languages yielding high values for level.


One word of caution is in order. Languages that
yield a high value for level do not necessarily relieve the
user of the responsibility for execution efficiency. APL is
a case in point; the order of operation may have a profound
effect on execution efficiency.


b. Completeness 


Completeness refers to the selection capability
of a query language, independent of any host language in
which it may be embedded. A complete query language allows
an authorized user to extract any data item that is
semantically contained within a data base. Actually
completeness is not a quantitative measure, but is included
here because of its theoretical basis in mathematics. Codd
[Ref. 19] established the basis for completeness of
relational algebra and relational calculus. Thus the
completeness for any language based on relational algebra or
relational calculus may be established by determining if it
permits the expression of any query expressible in the
relational calculus. Recent work [Ref. 20] suggesting the
equivalence of the three data models makes it appear
reasonable to attempt to determine completeness for
languages designed for hierarchical and network models. At
the very least, an intuitive judgment of completeness should
be obtained.
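
A rough feel for what "expressible" means can be had from a small sketch of three relational algebra operators. This is a minimal illustration under stated assumptions, not a test of completeness: Codd's algebra contains further operators (union, difference, and others) that are omitted, the function names are this sketch's own, and the two relations hold hypothetical data.

```python
def restrict(relation, predicate):
    """Restriction: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicates."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def join(left, right, attribute):
    """Join: pair tuples of two relations that agree on one attribute."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]


# Hypothetical relations.
CUSTOMER = [
    {"cust_id": 1, "name": "James M. Simpson"},
    {"cust_id": 2, "name": "A. Jones"},
]
ACCOUNT = [
    {"cust_id": 1, "kind": "checking", "balance": 250},
    {"cust_id": 1, "kind": "savings", "balance": 900},
    {"cust_id": 2, "kind": "checking", "balance": 75},
]

# "What is the balance of James M. Simpson's checking account?"
answer = project(
    restrict(
        join(CUSTOMER, ACCOUNT, "cust_id"),
        lambda r: r["name"] == "James M. Simpson" and r["kind"] == "checking",
    ),
    ["balance"],
)
```

A query language in which some composition of such operators cannot be stated, for instance one that cannot combine two relations at all, would fall short of completeness in the sense described above.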
2. Qualitative Measures


Mathematical sophistication, learnability, and
procedurality are proposed and discussed as qualitative
measures of a query language.


a. Mathematical Sophistication 


Mathematical sophistication is a subjective
measure of the degree to which a language requires a user to
be familiar with mathematical concepts, terminology, and
symbology.


There is much to be said for languages that have
a strong theoretical foundation in mathematics being used as
target languages for user-oriented source languages.
Relational algebra, discussed later, is often used for this
purpose. The degree to which the relational algebra
operators (projection, restriction, etc.), used to
manipulate data, are visible in the source language is an
indication of the amount of mathematical background required
by the user. While terms such as restriction and projection
are not well known, the actions accomplished by them are
certainly more natural than that of an algorithm which uses
a sequence of operations acting on one element at a time to
select data. However, when these operations are "visible"
at the user level, the user must mentally go through the
mathematical operations necessary to extract the data. Is
it reasonable for a casual user, such as a lawyer, to go
through a mathematical process to formulate a query? Should
he really have to do more than describe what he wants?


The presence of mathematical terms such as
"range" and symbols such as ∈, ⊂, ∧, ∨ are other
indicators of the amount of mathematical sophistication
required to use a language.


Languages requiring little or no mathematical
sophistication will tend to be non-procedural.


b. Learnability


In looking at the learnability of a language,
one should be interested in the time and effort required to
learn a workable subset to answer simple queries; this
working set should meet the needs of many users. The
restrictions and exceptions that must be mastered to compose
moderately complex queries may be a better indication of
learnability. The ability of casual users to retain what
they have learned between infrequent uses must also be
considered.


Human factors studies even more rigorous than
those conducted on SQUARE and SEQUEL [Ref. 21] and Query by
Example [Ref. 22] may be necessary to accurately identify
elements that affect learnability. These studies support
the notions that simple concepts should be used and that
those concepts which differ semantically should also differ
syntactically to avoid causing confusion. Reference 21
found that two different uses of the term WHERE were
confusing; this ambiguity was eliminated in a later version
of the language [Ref. 9]. Similarly, Reference 22 found that
subjects made mistakes on one-fourth of the occasions when
they needed to choose between the COUNT, SUM, COMPACT-COUNT,
and AVERAGE operators. It was also noted that a significant
percentage of users had trouble with the universal
quantification constructs - for all, and there exists. A
smaller number even had difficulty with the relational
operators <, >, ≤, ≥.


Languages that require less mathematical
background and are non-procedural tend to be more easily
learned by a broad class of users. Ideally a human factors
study should be conducted on the specific group for which a
language is intended.


The ability of a user to recall what he has
learned can be enhanced if the user is given an explicit
format with which to formulate his query. For example, in
SEQUEL [Ref. 16] queries are structured as:


SELECT <what>
FROM <relation name>
WHERE <condition>


The user is less likely to recall the operators necessary to
construct a query in languages containing more freedom of
form. It is possible for an implementation, particularly an
interactive one, to assist the user by providing prompts
(such as query formats on a CRT screen) or by providing
menus. Interactive input devices, such as lightpens,
tablets, and cursors, that allow the user to specify his
requirements in the most natural way may help appreciably.
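
The fixed three-part format survives in present-day SQL, so it can be tried out directly. The following is a sketch under stated assumptions: it uses SQL through Python's standard sqlite3 module as a stand-in for SEQUEL, and the account table and its contents are hypothetical. The query answers the routine question posed earlier for the on-line job-trained user.

```python
import sqlite3

# Hypothetical relation holding account balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, kind TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        ("James M. Simpson", "checking", 250),
        ("James M. Simpson", "savings", 900),
        ("A. Jones", "checking", 75),
    ],
)

# The query follows the explicit format: what, from which relation,
# under which condition. The "?" marks are filled in at execution time.
query = """
SELECT balance
FROM account
WHERE owner = ? AND kind = ?
"""
balance = conn.execute(query, ("James M. Simpson", "checking")).fetchone()[0]
```

Because every query has the same three slots, an interactive implementation could present them as a prompt or form and ask the user only to fill in the blanks.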
c. Procedurality 


Several of the measures described above have
made reference to the procedurality or non-procedurality of
a language. Many of these measures have a close correlation
to the degree of procedurality of a language. Languages
fall along a procedural -- non-procedural spectrum just as
users span a broad spectrum of backgrounds and needs. This
spectrum cannot be described in absolute terms. Languages
fall along this spectrum based on the following factors,
several of which may apply in varying degrees.


Languages that require the user to provide
a detailed access path to locate the desired data are highly
procedural. In such languages the user must have a thorough
knowledge of the underlying logical data organization. In
hierarchical models the user must literally start at the
root node and walk down the tree, node-by-node, until he
reaches the required record type. In network models there
may be several possible access paths (network models allow
records to have more than one parent). Since there is no
guarantee the user will pick the optimum access path, the
efficiency of the query's execution may be adversely
affected. Languages with this characteristic will have a
low level and generally require that queries be composed by
application programmers. The efficiency of problem solution
is, therefore, low. Additionally, the DBA has less freedom
in restructuring the underlying schema; doing so might
require the rewriting of user programs that run on a
recurring basis. In relational models different relations
do not have parent-member relationships. Therefore,
specifying the logical access path is not characteristic of
relational query languages.
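
The cost of a user-specified access path when the schema is restructured can be sketched as follows. Everything here is hypothetical and chosen for illustration only: the record types, the two versions of the hierarchy, and the helper functions navigate and find do not correspond to any particular DBMS.

```python
def node(rtype, children=(), **fields):
    """Build one record occurrence of the given type with child records."""
    return {"type": rtype, "fields": fields, "children": list(children)}


account = node("account", kind="checking", balance=250)
customer = node("customer", [account], name="James M. Simpson")

# Version 1 of the hierarchy: bank -> branch -> customer -> account.
v1 = node("bank", [node("branch", [customer], city="Monterey")])
# Version 2: the DBA has removed the branch level.
v2 = node("bank", [customer])


def navigate(root, path):
    """Procedural: walk down from the root along the record types in path."""
    current = [root]
    for rtype in path:
        current = [c for n in current for c in n["children"] if c["type"] == rtype]
    return current


def find(root, rtype):
    """Non-procedural: the system searches; the user names only the type."""
    found = [root] if root["type"] == rtype else []
    for child in root["children"]:
        found.extend(find(child, rtype))
    return found


user_path = ["branch", "customer", "account"]   # written against version 1
old = navigate(v1, user_path)                   # reaches the account
broken = navigate(v2, user_path)                # reaches nothing after the change
still_ok = find(v2, "account")                  # unaffected by the change
```

The stored query that spells out the path silently returns nothing once the branch level is removed, while the request that names only the wanted record type continues to work against both versions.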


(2) User Conduct 


Languages that require the user to program
detailed handling of each record or tuple characteristically
have go to, branching, and looping control structures. Once
the user arrives at a node (in a hierarchical model) he must
check each record element-by-element to determine if it
meets the search criteria. The user is also responsible for
handling exceptions and error conditions. Naturally,
languages requiring element-by-element search are quite
procedural, have a low level, and are meant for users with
programming experience.


(3) User Specified Logical Order of Operations 





In languages exhibiting the preceding two
characteristics the user specifies the sequence in which
operators are executed. However, this trait is identified
separately because there are languages in which the user is
not required to specify the access path or process records
(tuples) individually but which do require him to logically
apply operators in a specified sequence to locate data.
Languages using terse relational algebra notation have this
requirement. The term logical was used because the system
may optimize the actual order of execution; if not, the user
affects efficiency. Languages falling on this part of the
spectrum are not limited to application programmers, but
generally do require a significant degree of mathematical
sophistication.
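
Why the order in which operators are applied matters can be seen in a small sketch. It assumes a naive join that compares every pair of tuples and a system that performs no reordering of its own; the relations and the helper names are hypothetical.

```python
def restrict(relation, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def join(left, right, attribute):
    """Naive join: compares len(left) * len(right) pairs of tuples."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]


# Hypothetical relations of 100 tuples each.
CUSTOMER = [{"cust_id": i, "name": f"customer {i}"} for i in range(100)]
ACCOUNT = [{"cust_id": i, "balance": 10 * i} for i in range(100)]


def wanted(row):
    return row["cust_id"] == 7


# Order 1: join first, restrict afterwards.
joined = join(CUSTOMER, ACCOUNT, "cust_id")   # 100 x 100 comparisons, 100 tuples kept
late = restrict(joined, wanted)

# Order 2: restrict first, join afterwards.
small = restrict(CUSTOMER, wanted)            # a single tuple survives
early = join(small, ACCOUNT, "cust_id")       # 1 x 100 comparisons
```

Both orders yield the same single tuple, but the second builds a much smaller intermediate result. When the language leaves the ordering to the user, this difference becomes the user's responsibility.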


Language implementations exist in which the
user does not actually specify the order of operations, but
where he can significantly affect it. This feature will be
discussed further in the examples presented in Chapter IV.




At the extreme procedural end of the
spectrum the user must have detailed knowledge of access
paths. Toward the center of the spectrum relational
languages still require knowledge of what attributes are in
what relations; this knowledge must be used to "navigate
across relational boundaries" [Ref. 12]. Network and
hierarchical languages in the center require knowledge of
what data item is in specific files. At the non-procedural
end the user is required only to know attribute or data item
names. Some language implementations accept predefined
synonyms or derivatives of actual data names. Such
implementations are said to be "user-friendly."


Many attempts have been made to give adjectives
to languages along the spectrum. Very procedural languages
in which the user must have detailed knowledge of the data
base in order to specify an element-by-element path through
the data base are called "navigational" languages [Ref. 23].
Languages along the center of the spectrum, generally
exhibiting factors three and four, are "prescriptive"
languages. Languages at the non-procedural end, simply
requiring the user to state attribute names, are
"descriptive". Languages requiring no knowledge of the data
base are termed "open-ended". The special class of query
languages which allow the user to specify queries using
tables, forms, and geometric images are also described using
these adjectives.


In contrast to the procedural languages, the 
non-procedural languages require little knowledge of the 
data base or its underlying model; require less source code 
(higher level); allow the system to determine access paths; 
allow information to be addressed by content (as opposed to 
location); require less mathematical sophistication; free 
the user from error handling and execution efficiency 
considerations; are more efficient in total problem 
solution; and are easier to learn and retain. They are 
generally, though not necessarily, interactive. They also 
require another layer of software and thus may require more 
machine time. The extra layer of software gives the DBA 
more freedom in changing the logical structure of the data 
base. 


It is important to note that every 
non-procedural query language must be supported by an 
underlying procedural language. It is also reasonable to use 
the criteria presented in Chapter One for selecting general 
purpose programming languages as a starting point for 
evaluating a non-procedural query language. 


A sampling of query languages is presented in order to 
examine the characteristics discussed in Chapter III. No 
attempt has been made to limit the sampling to commercially 
available languages since languages were selected on the 
basis of their ability to demonstrate various 
characteristics. The first two examples do not conform 
syntactically to any actual languages, but are used to 
represent a large group of navigational languages. 


The sample query used throughout the examples is applied 
to the data base in Figures 1 and 2. The sample data base 
was extracted from Ref. 10. The sample query, 


Q1: LIST THE ELECTION YEARS IN WHICH A REPUBLICAN 
FROM CALIFORNIA WAS ELECTED. 


is solved in all examples. 


A. PROCEDURAL - NETWORK 


The following example uses Figure 2 and is 
representative of languages such as IMS [Ref. 24], 
CODASYL/DBTG [Ref. 4], and IDMS [Ref. 25]. This language 
models a network language, but a hierarchical procedural 
language would employ similar constructs. A typical query 
for Q1 might be formulated as follows: 


ELECTIONS-WON 

     YEAR   WINNER-NAME   WINNER-VOTES 
            Eisenhower 
            Eisenhower 
            Kennedy 
            Johnson 
            Nixon 
            Nixon 


PRESIDENTS 

     NAME         PARTY        HOME-STATE 
     Eisenhower   Republican 
     Kennedy      Democrat 
     Johnson      Democrat     Texas 
     Nixon        Republican   Calif 


Figure 1 - SAMPLE DATA BASE - RELATIONAL 


PRESIDENTS 

     Johnson   Democrat     Texas 
     Nixon     Republican   Calif 


Figure 2 - SAMPLE DATA BASE - NETWORK 


Q1:     MOVE "Republican" TO PARTY AND "Calif" TO 
            HOME-STATE IN PRESIDENT 
        FIND PRESIDENT RECORD 
        if failure go to error 
LOOP1:  FIND FIRST MEMBER OF P-EW SET 
        if none go to error 
        PRINT YEAR 
LOOP2:  FIND NEXT MEMBER OF P-EW SET 
        if none go to LOOP3 
        PRINT YEAR 
        go to LOOP2 
LOOP3:  FIND NEXT PRESIDENT RECORD 
        if none "done" 
        go to LOOP1 


The query resembles a program in a simple general purpose 
programming language. Answering a query in this language 
amounts to writing a program that "navigates" through the 
data element-by-element. Indeed the language was designed 
to be used by programmers. It is low level and requires 
detailed knowledge of the data base. 
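
The navigation performed by the network query can be sketched in a modern general purpose language. The following Python fragment is only an illustration under stated assumptions: the class names, the p_ew field (standing in for the P-EW set owned by each PRESIDENT record), and the sample rows are hypothetical and are not taken from any of the systems cited. 

```python
from dataclasses import dataclass, field


@dataclass
class ElectionRecord:
    year: int
    votes: int


@dataclass
class PresidentRecord:
    name: str
    party: str
    home_state: str
    # Member records of the P-EW set owned by this president (assumed layout).
    p_ew: list = field(default_factory=list)


def q1_network(president_records):
    """Walk owner records, then each owner's set members, one record at a time."""
    years = []
    for president in president_records:            # FIND (NEXT) PRESIDENT RECORD
        if president.party == "Republican" and president.home_state == "Calif":
            for election in president.p_ew:        # FIND FIRST/NEXT MEMBER OF P-EW SET
                years.append(election.year)        # PRINT YEAR
    return years


# Illustrative rows only (assumed values).
records = [
    PresidentRecord("Johnson", "Democrat", "Texas", [ElectionRecord(1964, 486)]),
    PresidentRecord("Nixon", "Republican", "Calif",
                    [ElectionRecord(1968, 301), ElectionRecord(1972, 520)]),
]
print(q1_network(records))
```

As in the query above, the caller must know that election records hang off president records; the access path is built into the program. 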


B. PROCEDURAL - RELATIONAL 


This example, using Figure 1, was included to 
demonstrate that low level relational languages such as XRM 
[Ref. 26] and GAMMA-O [Ref. 27] have characteristics in 
common with network/hierarchical procedural languages. 
Those minor differences which do exist are not significant. 


Q1:     FIND FIRST PRESIDENTS TUPLE 
            WHERE PARTY = "Republican" 
            AND HOME-STATE = "Calif" 
        if failure; return "no such president" 
LOOP1:  save name 
        FIND ELECTIONS-WON TUPLE WHERE 
            WINNER-NAME = saved name 
        if failure; return "president exists that did 
            not win election" 
LOOP2:  PRINT YEAR 
        FIND NEXT ELECTIONS-WON TUPLE WHERE 
            WINNER-NAME = saved name 
        if none go to LOOP3 
        go to LOOP2 
LOOP3:  FIND NEXT PRESIDENTS TUPLE WHERE PARTY = 
            "Republican" AND HOME-STATE = "Calif" 
        if none "done" 
        go to LOOP1 


Although there is no access path to specify, the 
programmer must still use the knowledge that the President's 
name is common to two relations in order to navigate across 
relational boundaries and extract YEAR. 
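
The tuple-at-a-time style of this example can be sketched in Python as nested loops over two lists of tuples. This is a minimal sketch, not the syntax of XRM or GAMMA-O; the function name and the sample rows (including years, vote counts, and some home states) are assumptions made for illustration. 

```python
# Illustrative rows only (assumed values): (NAME, PARTY, HOME-STATE)
PRESIDENTS = [
    ("Eisenhower", "Republican", "Texas"),
    ("Kennedy", "Democrat", "Mass"),
    ("Johnson", "Democrat", "Texas"),
    ("Nixon", "Republican", "Calif"),
]
# Illustrative rows only (assumed values): (YEAR, WINNER-NAME, WINNER-VOTES)
ELECTIONS_WON = [
    (1952, "Eisenhower", 442),
    (1956, "Eisenhower", 457),
    (1960, "Kennedy", 303),
    (1964, "Johnson", 486),
    (1968, "Nixon", 301),
    (1972, "Nixon", 520),
]


def q1_tuple_at_a_time(presidents, elections):
    """Answer Q1 by examining one tuple at a time in each relation."""
    years = []
    for name, party, home_state in presidents:        # FIND FIRST/NEXT PRESIDENTS TUPLE
        if party != "Republican" or home_state != "Calif":
            continue
        saved_name = name                              # "save name"
        for year, winner_name, _votes in elections:    # FIND NEXT ELECTIONS-WON TUPLE
            if winner_name == saved_name:
                years.append(year)                     # PRINT YEAR
    return years


print(q1_tuple_at_a_time(PRESIDENTS, ELECTIONS_WON))
```

The programmer supplies the link between the two relations (the saved name) by hand, which is the navigation across relational boundaries described above. 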


C. RELATIONAL ALGEBRA 


These languages and the one used in the following 
example are based in set theory. They were proposed by Codd 
when he first introduced the relational model [Ref. 6] and 
later when he further defined a relational algebra and 
relational calculus [Refs. 19, 28]. Languages based on 
relational algebra include MACAIMS [Ref. 29], IS/1 
[Ref. 30], and RDMS [Ref. 31]. The query Q1, using Figure 
1, is presented in the relational algebra followed by a 
simple description of the operators used. A more complete 
description of relational operators is provided in Ref. 10. 


#(0) (σ[(1=3)] ELECTIONS-WON * (#(0) (σ[(1="Republican" 
& 2="Calif")] PRESIDENTS))) 


Processing begins at the innermost nested level. 
Domains are numbered left to right beginning with zero. 
RESTRICTION (σ) of a relation amounts to selecting all tuples 
(rows) that meet the conditions defined by the restriction 
of the relation specified. Thus 


     σ[(1="Republican" & 2="Calif")] PRESIDENTS 


selects all tuples from the relation PRESIDENTS in which 
the attribute (column) number "1" has the value "Republican" 
and attribute number "2" has the value "Calif". The result 
is a new relation, A, which serves as the operand for the 
next operator. A has the value: 


     NAME    PARTY        HOME-STATE 
     Nixon   Republican   Calif 


PROJECTION (#) extracts specified attributes. The result is 
a new relation. Thus #(0)A selects the name attribute from 
A yielding B: 


     NAME 
     Nixon 


PRODUCT (*) concatenates its left and right arguments. If N 
and M are the cardinalities of the relations, the result is a 
new relation having N x M tuples. ELECTIONS-WON * B yields 


     YEAR   WINNER-NAME   WINNER-VOTES   NAME 
            Eisenhower                   Nixon 
            Eisenhower                   Nixon 
            Kennedy                      Nixon 
            Johnson                      Nixon 
            Nixon                        Nixon 
            Nixon                        Nixon 


To this result a restriction is applied in which WINNER-NAME 
= NAME to eliminate ambiguous information such as the tuple 
"1952, Eisenhower, 442, Nixon". The result is: 


     YEAR   WINNER-NAME   WINNER-VOTES   NAME 
            Nixon                        Nixon 
            Nixon                        Nixon 





A projection yields the final answer: 





Actual implementations might use a more descriptive 


syntax in order to improve readability and learnability. 
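
The three operators can be sketched as small Python functions over lists of tuples, with attributes addressed by position as in the expression above. This is an illustrative sketch only: the function names and the sample rows (years, vote counts, and some home states) are assumptions, not part of any implementation cited here. 

```python
# Illustrative rows only (assumed values).
PRESIDENTS = [
    ("Eisenhower", "Republican", "Texas"),
    ("Kennedy", "Democrat", "Mass"),
    ("Johnson", "Democrat", "Texas"),
    ("Nixon", "Republican", "Calif"),
]
ELECTIONS_WON = [
    (1952, "Eisenhower", 442),
    (1956, "Eisenhower", 457),
    (1960, "Kennedy", 303),
    (1964, "Johnson", 486),
    (1968, "Nixon", 301),
    (1972, "Nixon", 520),
]


def restrict(relation, predicate):
    """RESTRICTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, *columns):
    """PROJECTION: keep the numbered columns and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[c] for c in columns)
        if row not in result:
            result.append(row)
    return result


def product(left, right):
    """PRODUCT: concatenate every left tuple with every right tuple."""
    return [l + r for l in left for r in right]


# Same order of evaluation as the nested expression, innermost first.
a = restrict(PRESIDENTS, lambda t: t[1] == "Republican" and t[2] == "Calif")
b = project(a, 0)
c = product(ELECTIONS_WON, b)                 # N x M = 6 x 1 tuples
d = restrict(c, lambda t: t[1] == t[3])       # domain 1 = domain 3
answer = project(d, 0)
print(answer)
```

The caller still chooses the operators and their order; only the element-by-element work is hidden inside the functions. 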


The significant difference between this and the two 
preceding examples is that the user specified what he 
wanted in terms of sets, not in terms of individual records. 
The underlying system took care of locating the relations, 
doing element-by-element processing, including error 
handling and storing intermediate results (perhaps 
virtually). The language is not navigational; it is 
essentially prescriptive. However, the user still had to 
specify the operations to be performed and the order in 
which they were to be performed. Some implementations may 
optimize the query by (1) finding an equivalent re-ordering 
of the operators, (2) performing some operations virtually, 
(3) using a different combination of operators, (4) using a 
different set of relations and attributes, or (5) a 
combination of the above. Where no such optimization is 
performed, the user determines or affects the execution 
efficiency of his query. 


The relational algebra does not require the programming 
skills of the first two examples, but the mathematically 
unsophisticated user might find it cumbersome, at least in 
its terse form. 


D. RELATIONAL CALCULUS 


In the relational calculus the user specifies what he 
wants using logical operators. The system maps the query 
into an equivalent expression in the relational algebra. 
Relational calculus may be thought of as restrictions 
followed by projections. Q1 stated in the relational 
calculus uses Figure 1: 


{x: ∃((y,Republican,Calif) ∈ PRESIDENTS ∧ (x,y,z) ∈ 
ELECTIONS-WON)} 


The query specifies what is wanted, x; where it comes 
from, ELECTIONS-WON; and with what qualifications. 
Paraphrasing, select x from (x,y,z), an element of (∈) 
ELECTIONS-WON, where z has any value and (∧) y is taken from 
(y,Republican,Calif), an element of PRESIDENTS. X, y, and z 
are variables; Republican and Calif are constants. 
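
A set comprehension in Python reads much like the calculus expression and may help fix the idea. The sketch below is illustrative only; the relation contents are assumed sample rows, and the variable names x, y, and z simply mirror the expression above. 

```python
# Illustrative rows only (assumed values), held as sets of tuples.
PRESIDENTS = {
    ("Eisenhower", "Republican", "Texas"),
    ("Kennedy", "Democrat", "Mass"),
    ("Johnson", "Democrat", "Texas"),
    ("Nixon", "Republican", "Calif"),
}
ELECTIONS_WON = {
    (1952, "Eisenhower", 442),
    (1956, "Eisenhower", 457),
    (1960, "Kennedy", 303),
    (1964, "Johnson", 486),
    (1968, "Nixon", 301),
    (1972, "Nixon", 520),
}

# {x : (x, y, z) in ELECTIONS-WON and (y, Republican, Calif) in PRESIDENTS}
answer = {
    x
    for (x, y, z) in ELECTIONS_WON
    if (y, "Republican", "Calif") in PRESIDENTS
}
print(sorted(answer))
```

The expression states a condition on the result and names no operators or order of evaluation; how the membership tests are carried out is left to the interpreter. 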


The relational calculus is prescriptive. The user 
relies on his knowledge of what attributes are in what 
relations to prescribe a solution to his query. It is the 
system's responsibility to determine the exact sequence of 
operations to be used. As with the relational algebra, it 
is still possible that with some decomposition algorithms 
the user may have a profound influence on execution 
efficiency. This is particularly true in implementations 
that use join as one of the operators in the underlying 
algebra (Pecherer [Ref. 32] has shown that certain subsets 
of the relational algebra are sufficient). 


As discussed earlier, many users have difficulty with the 
universal and existential quantifiers as well as set 
notation. This could limit the number of users who would 
feel comfortable with this language. 


E. QUEL 


QUEL [Ref. 33] is typical of languages which are based 
on the relational calculus. Other such languages include 
ALPHA [Ref. 28], COLARD [Ref. 34], and RIL [Ref. 35]. 
These languages do not require the user to apply quantifiers 
directly. 


QUEL is actually a query language as well as a DSL which 
is embedded in the "C" language and the UNIX operating 
system using the procedural language EQUEL [Ref. 33]. 
The solution to Q1, using Figure 1, is presented: 


RANGE OF E IS ELECTIONS-WON 
RANGE OF P IS PRESIDENTS 
RETRIEVE E.YEAR 
WHERE E.WINNER-NAME = P.NAME 
WHERE P.PARTY = "Republican" AND 
      P.HOME-STATE = "Calif" 


The RANGE statements specify the relations used in the 
RETRIEVE and WHERE statements. The requirement for the user 
to have detailed knowledge of relational structure is 
reflected in the block structure of the language. Here 
again the user is prescribing a means to navigate across 
relational boundaries. 


F. CUPID 


References 36 and 37 describe CUPID (Casual User 
Pictorial Interface Design), which as its name implies is a 
picture-oriented query language. It is intended for the 
"casual" user. CUPID contains a high-level, menu-type 
sublanguage which is the front-end to INGRES, the relational 
data base system supporting QUEL. Additionally, CUPID 
offers a user definition facility which allows the system to 
"learn" new concepts. CUPID is presented here due to its 
graphic nature, which places it in the "special" category of 
query languages. 


Figure 3 illustrates how Q1 might appear on a cathode 
ray tube device. An English language approximation of the 
query as it is depicted here would be, "Select and save 
Name from Presidents relation where Party equals Republican 
and Home-state equals Calif; select and output Year from 
Elections-won relation where Winner-name equals saved Name." 
The graphic diagram is drawn as a result of user inputs from 
a keyboard. 


In order for the individual graphic symbols to be 
displayed, the user must select from a menu of available 
shapes those symbols which are necessary to formulate 
his query, and then, through interactive cueing, properly 
position each symbol. It would seem that this would, for 
all practical purposes, necessitate the user having sketched 
his intended query, or at least having it well-formed in his 
own mind, prior to commencing "construction". There are 
unique shapes available in the menu to represent each of the 
following entities: relation names, domain names, 
relational operators, arithmetic operators, logical 
operators, constants, q-boxes (designating the target list of 
requested data), and a special symbol to enclose aggregate 
operations. 


     PRESIDENTS   NAME   PARTY   HOME-STATE 

     YEAR   WINNER-NAME 


Figure 3 - CUPID QUERY 


     PARTY   HOME-STATE 
     Republican 

     WINNER-NAME 


Figure 4 - AMBIGUOUS QUERY IN CUPID 


While this query appears fairly straightforward, some 
additional comments are in order. CUPID is prescriptive 
since the user must provide a graphical "prescription" for 
answering a query. It would indeed be difficult to assess 
the learnability of this language without a much more 
thorough examination, although one small study reported 
favorable results [Ref. 37]. It can, however, be observed 
that the menu-selection feature would most surely stimulate 
the infrequent user's recall. The language requires a 
fairly high level of mathematical sophistication; even the 
most simple queries generally cannot be formulated without 
boolean operators. Furthermore, it is not clear (at least 
to the casual user) whether the query depicted in Figure 4 
would yield the same results as that in Figure 3; a 
substantial understanding of the formal syntax is also 
required. Considering these factors, it would seem that the 
language would be more appropriate for the on-line 
job-trained user or perhaps the research category of user. 
As for the level and completeness of CUPID, it would be 
expected that both are very near QUEL, as the pictorial 
queries of CUPID are compiled into that prescriptive query 
language. 

An additional, unique feature of CUPID is its definition 
capability. There are provisions for both user definitions 
and "learning" (through global definition tables). As an 
example, Figure 5 depicts how a user might formulate the 
query Q2: "List the names of minority presidents." The 
vocabulary definition algorithms would ideally resolve both 
problems presented by this query: (1) define 
"minority-president" and (2) resolve the apparent 
misplacement of "minority-president", an unknown value for 
the winner-name domain. The user would eventually be 
required, through a cueing sequence, to provide the maximum 
number of votes which would qualify a member of the 
winner-name domain as a minority president. 


     WINNER-NAME   WINNER-VOTES 

     Majority 


Figure 5 - USER-FRIENDLY QUERY IN CUPID 


With some practical experience, it appears that CUPID 
would be an interesting, tidy method of expressing simple 
queries, and almost fun to use. On the other hand, more 
complicated queries, such as the examples depicted in Ref. 
36, become quite cumbersome and difficult to follow. 
Nonetheless, its designer should be commended for her unique 
efforts in the area of pictorial (graphical) queries, her 
success with user vocabulary definition, and the 
contribution to reduction of typographic and spelling errors 
in English-language text input by use of the "menu" 
facility. 


G. SEQUEL 


SEQUEL [Ref. 241], an outgrowth of SQUARE [Ref. 38], is 
based on what Chamberlin [Ref. 10] calls 
"mapping-oriented languages". Mapping-oriented languages 
specify queries by defining a mapping between the desired 
result, which is a relation, and relations which are known 
to exist in the data base. SEQUEL was originally intended 
for interactive problem solving by non-computer specialists. 
System R [Ref. 31] implements SEQUEL both as a stand-alone 
language and as a DSL callable from application programming 
languages. The SEQUEL syntax resembles that of QUEL. Both 
are block structured and use WHERE statements. The SEQUEL 
solution of Q1 using Figure 1 is: 


SELECT  YEAR 
FROM    ELECTIONS-WON 
WHERE   WINNER-NAME = 
        SELECT  NAME 
        FROM    PRESIDENTS 
        WHERE   PARTY = "Republican" 
        AND     HOME-STATE = "Calif" 


Most of the comments about QUEL are applicable here. 
SEQUEL is prescriptive. The syntax of SEQUEL (SELECT, FROM 
vice RANGE, RETRIEVE) seems more natural and more learnable 
than that of QUEL (see Ref. 21 for a human factors study of 
SEQUEL). The mathematical sophistication required for 
simple queries is quite low. More complex queries may 
require use of the set operators - union, intersection, and 
set difference between intermediate mappings. For example, 
use the relations in Figure 1 to answer Q3: 


Q3: FIND THE NAMES OF PRESIDENTS BORN IN TEXAS WHO RECEIVED 
MORE THAN 400 WINNER-VOTES. 


An appropriate query might be: 


SELECT  WINNER-NAME 
FROM    ELECTIONS-WON 
WHERE   WINNER-VOTES > 400 
INTERSECT 
SELECT  NAME 
FROM    PRESIDENTS 
WHERE   HOME-STATE = "Texas" 


Some users may not have the prerequisite background for 
the proper use of the set operators. 
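
Both mappings can be tried with a present-day SQL engine. The sketch below uses SQLite through Python's standard sqlite3 module; it is an approximation of SEQUEL, not SEQUEL itself. The assumptions are: underscores replace hyphens in the names, IN replaces "=" before the nested mapping, the ORDER BY clauses are added only to make the output order predictable, and the rows are illustrative sample values. 

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE presidents (name TEXT, party TEXT, home_state TEXT);
    CREATE TABLE elections_won (year INTEGER, winner_name TEXT, winner_votes INTEGER);
""")
# Illustrative rows only (assumed values).
conn.executemany("INSERT INTO presidents VALUES (?, ?, ?)", [
    ("Eisenhower", "Republican", "Texas"),
    ("Kennedy", "Democrat", "Mass"),
    ("Johnson", "Democrat", "Texas"),
    ("Nixon", "Republican", "Calif"),
])
conn.executemany("INSERT INTO elections_won VALUES (?, ?, ?)", [
    (1952, "Eisenhower", 442), (1956, "Eisenhower", 457),
    (1960, "Kennedy", 303), (1964, "Johnson", 486),
    (1968, "Nixon", 301), (1972, "Nixon", 520),
])

# Q1: a nested mapping, written with IN in modern SQL.
q1 = conn.execute("""
    SELECT year FROM elections_won
    WHERE winner_name IN (SELECT name FROM presidents
                          WHERE party = 'Republican' AND home_state = 'Calif')
    ORDER BY year
""").fetchall()

# Q3: the intersection of two intermediate mappings.
q3 = conn.execute("""
    SELECT winner_name FROM elections_won WHERE winner_votes > 400
    INTERSECT
    SELECT name FROM presidents WHERE home_state = 'Texas'
    ORDER BY 1
""").fetchall()

print(q1)
print(q3)
```

Note that the user still names both relations and the attribute that links them; the engine chooses only how the mappings are evaluated. 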


H. QUERY BY EXAMPLE 


Query by Example [Refs. 39, 40] is intended to serve the 
needs of the non-programming casual user with little 
mathematical background. It is presented here as an example 
of a category of query languages generally referred to as 
"forms". MARK IV [Ref. 41], intended for batch usage, was 
an early attempt at the forms approach. Another significant 
example of this approach has been presented by the CODASYL 
End User Facility Task Group [Ref. 42], which is attempting 
to define a "language" to emulate the naturalness of 
manually extracting data from "forms" which are familiar to 
the user. 


In Query by Example the user formulates his query by 
displaying blank tables on the CRT, naming the tables and 
their columns, and filling in the columns to illustrate the 
query to be answered. The queries are then translated into 
relational calculus for processing. 


The Query by Example solution to Q1 follows: 











ELECTIONS-WON   YEAR      WINNER-NAME 

                P.1948    WILSON 
                  ----    ------ 


PRESIDENTS      NAME      PARTY        HOME-STATE 

                WILSON    Republican   Calif 
                ------ 


Republican and Calif are "constant elements" and are not 
underlined. WILSON and 1948 are variables, termed 
"example elements", and are designated as such by 
underlining. Example elements need not be actual elements 
in the data base. "P." specifies that the element is to be 
retrieved and printed. 


This language is classified as "prescriptive" because 
the user must prescribe the means to navigate across two 
"tables" in a manner not unlike that required by SEQUEL. 
Reference 22 presents a human factors study of Query by 
Example. As discussed earlier, universal quantification 
presented a problem for some users. Q4, taken from 
[Ref. 39], is a query requiring universal quantification. 


Q4: FIND THE NAMES OF SUPPLIERS WHO SUPPLY A JOB LOCATED 
IN NEW YORK WITH ALL PARTS OF TYPE A. 


ALL ROD means all PART-NAMES of TYPE A. The dot 
indicates that a SUPPLIER may supply more than parts of 
TYPE A to a job in New York. 


In comparing this query with that for relational 
calculus, note how the target language shows through. The 
problem with universal quantification was also detected in 
the evaluation of SEQUEL [Ref. 21]. This suggests that the 
level of mathematical sophistication required for a language 
may be easily under-estimated, and that the casual user's 
facility with the language may be limited. 


I. APPLE 


APPLE [Ref. 12] is the language used by a developing 
system which allows the user to specify queries using only 
attribute names. The formulation of Q1 follows: 


SELECT YEAR WHERE PARTY = "Republican" AND 
HOME-STATE = "Calif" 


This language is truly "descriptive". The user need not 
"navigate" or give a "prescription" for navigating across 
relational boundaries; he need not even know what boundaries 
exist, let alone know what attributes transcend their 
boundaries. All that is required is a knowledge of 
attribute names. The system determines access paths and 
identifies the operators necessary to answer the query. 
APPLE currently has some implementation problems relating to 
the resolution of ambiguities within queries, but, when 
these are satisfactorily resolved, it could present the 
casual user with a language that is truly simple to 
understand and use. 


J. NATURAL LANGUAGE 


The idea of using natural English as a query language is 
not new or unique [Ref. 43]. In fact, Simmons [Ref. 44] 
reviewed fifteen experimental question-answering systems 
more than ten years ago. 


Due to the inherent complexities of the English 
language, it is not surprising that while significant 
progress has been made, the field remains a virtual 
frontier. Martin [Ref. 45] has stated, "The art of 
devising dialogues between men and machines should be 
regarded as a new form of literacy. As yet the majority of 
practitioners of this art are unquestionably illiterate." 


Since it has, in some sense, evolved as an efficient 
medium for communications interchange between men, it would 
seem that English must surely be the ultimate man-machine 
communications medium. The motivation is simply an 
extension of the previously discussed trend in query 
languages: making the machine do more so that man expends 
less effort in interacting with the machine. The solution 
is not quite as simply stated. Almost every attempt at 
employing natural English as a query language has actually 
utilized an extremely restricted subset of the language. 


Instead of limiting the system to a subset of the 
English language, another approach which is relatively easy 
to implement is the computer-initiated dialogue. The user 
may experience the sensation of communicating with the 
machine in a very natural manner, when in reality the 
machine forces the use of a restricted subset of the 
language by initiating the dialogue and requiring the user 
to respond in an unambiguous manner which can be clearly 
understood by the machine. 


To implement a genuine, user initiated dialogue system 
remains a challenge for the following reasons: 


Free-form input opens a Pandora's box of potential 
problems with respect to typographic and spelling errors. 
Some query languages, such as CUPID, have reduced the 
significance of this problem by the implementation of a 
menu. 


2. Syntactic and Lexical Ambiguities 

Many grammatically correct English language requests 
are inherently ambiguous. For instance, "List students 
receiving diplomas by sections," can easily be interpreted 
at least two different ways; the "by sections" phrase can be 
applied to modify (1) the verb "list" or (2) the phrase 
"receiving diplomas". Codd [Ref. 15] has proposed a method 
of "RENDEZVOUS with the casual user" which does much to 
resolve this problem by having the machine introduce 
clarification dialogue limited to the machine-comprehensible 
subset of the natural language, and by periodically 
restating the user's query. 


Many languages (including CUPID, RENDEZVOUS, and 
IQF) now employ features which allow the user to define 
individual terms to suit his own needs. 


There can be no doubt that natural English query 
languages will eventually come into widespread usage. In 
the interim, however, it is not unreasonable to expect the 
casual, but interested, user to familiarize himself with the 
environment of the DBMS and one of the currently available 
prescriptive query languages. Surely a clerk who has used 
tape-output adding machines for his entire career would be 
quite bewildered the first time he attempted to use a modern 
hand-held, reverse-polish notation calculator. But once 
acquainted with the machine, and its increased computational 
power, it would probably be difficult to persuade him to 
return to his former environment. 


V. CONCLUSIONS 


A. SUMMARY 


The different classes of data base users were 
identified. Differences among these users were 
characterized which pointed to the need for measures with 
which to evaluate query languages. Quantitative and 
qualitative measures were proposed. Procedurality was shown 
to be a good overall measure of a language. 


Summarizing, procedural languages (1) are intended for 
experienced programmers, (2) require more source code, (3) 
decrease programmer efficiency, (4) give less logical data 
independence, and (5) increase machine efficiency (provided 
queries are well designed). The more non-procedural 
languages (1) may be used by a broader spectrum of users, 
(2) require less source code, (3) increase programmer 
productivity, (4) provide increased logical data 
independence, and (5) decrease total problem solution time. 
Thus non-procedural languages increase human productivity at 
the expense of machine time due to the additional layers of 
software. 


It is evident that the user must be a prime 
consideration in selection of a query language. The more 
casual the user the more non-procedural the language must 
be. "However, there will, no doubt, always be users whose 
interaction rates are so high, whose types of interactions 
are limited and whose data structures change slowly enough 
that they will rationally prefer a procedural system" 
[Ref. 20]. The final choice of language(s) must be made 
within the context of other realities. One may be limited 
to existing hardware, in which case increased machine time 
may not be available or the desired software interface may 
not exist. 


Currently, the choice of commercial languages is limited 
to those which are designed for network and hierarchical 
models; it may be some time before a relational model is 
commercially available. The importance of the underlying 
model is likely to diminish in light of Stonebraker and 
Held's suggestion that the hierarchical and relational 
models are special cases of the network model [Ref. 20], 
and Date's proposed architecture for a single high-level 
language which supports all three data models [Ref. 46]. 


Reference 20 suggests another factor that should not be 
overlooked -- non-procedural languages are inappropriate for 
certain queries. For example, consider the query: 


Q5: FIND the President receiving the second highest 
WINNER-VOTES. 


It is unlikely that a non-procedural language processor 
would handle Q5 efficiently. 
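
By contrast, a procedural formulation of Q5 is short and makes a single pass over the relation. The Python sketch below is offered only as an illustration; the function name, the treatment of ties (a tuple that ties the highest count is not counted as second), and the sample rows are assumptions. 

```python
# Illustrative rows only (assumed values): (YEAR, WINNER-NAME, WINNER-VOTES)
ELECTIONS_WON = [
    (1952, "Eisenhower", 442),
    (1956, "Eisenhower", 457),
    (1960, "Kennedy", 303),
    (1964, "Johnson", 486),
    (1968, "Nixon", 301),
    (1972, "Nixon", 520),
]


def second_highest_winner(elections):
    """Return the tuple with the second highest vote count, or None."""
    best = second = None
    for row in elections:
        if best is None or row[2] > best[2]:
            # New maximum: the old maximum becomes the runner-up.
            best, second = row, best
        elif row[2] != best[2] and (second is None or row[2] > second[2]):
            second = row
    return second


print(second_highest_winner(ELECTIONS_WON))
```

The programmer decides how the data is scanned and what is remembered along the way, which is exactly the control a non-procedural language withholds. 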


All things considered, most general data base 
implementations will require a mix of query languages to 
meet the needs of their various users, while a single query 
language may suffice for some special purpose data bases. 


Two factors will impact on the DBMS of the future. 
First, the machine's role in the man-machine symbiosis will 
expand beyond providing data which man analyzes and uses to 
make decisions. The machine itself will be programmed to 
access the data base and conduct increasingly higher levels 
of analysis to assist man in making complex decisions. A 
thorough discussion of "decision support systems" is found 
in Ref. 47 and an experimental system, GMIS, is the subject 
of Ref. 48. 


Secondly, the arrival of mass storage devices on the 
market and the expected arrival of reasonably priced 
associative memory hardware will undoubtedly have a profound 
impact on the future of DBMS's and query languages. Mass 
storage devices make it practical to store large volumes of 
information on-line. Associative memory will surely serve 
as a catalyst for the development of the Data Base Machine 
(DBM). 


The DBM will allow data base management functions, at 
all levels, to be separated from the traditional operating 
system. The DBM will not be dependent on the operating 
system's access methods, nor will it rely on the operating 
system for I/O control. It will be a specialized machine 
within a computer system which provides services to 
application processes. The DBM will be able to run 
asynchronously with the central processing unit and will 
make available data stored on secondary storage devices. 
Efficiency of query processing will improve because the DBM 
will have singular control over access, integrity, and 
protection [Ref. 49]. Freedom from the operating system and 
the use of associative memory (which will allow logical 
storage to be closer in form to physical storage) will 
further increase efficiency and decrease the machine "costs" 
of catering to the casual user. 


B. FURTHER RESEARCH 


Several areas were discussed having potential for future 
research which could lead to further improvement in the user 
interface. Human factors studies are needed to determine in 
general what language characteristics are compatible with 
each user class; where feasible, studies of specific 
user/language combinations should be conducted. Additional 
research in software engineering could result in a more 
complete set of quantitative measures of languages. 